
    Using millions of emoji occurrences to learn any-domain representations for detecting sentiment, emotion and sarcasm

    NLP tasks are often limited by scarcity of manually annotated data. In social media sentiment analysis and related tasks, researchers have therefore used binarized emoticons and specific hashtags as forms of distant supervision. Our paper shows that by extending the distant supervision to a more diverse set of noisy labels, the models can learn richer representations. Through emoji prediction on a dataset of 1246 million tweets containing one of 64 common emojis we obtain state-of-the-art performance on 8 benchmark datasets within sentiment, emotion and sarcasm detection using a single pretrained model. Our analyses confirm that the diversity of our emotional labels yields a performance improvement over previous distant supervision approaches.
    Comment: Accepted at EMNLP 2017. Please include EMNLP in any citations. Minor changes from the EMNLP camera-ready version. 9 pages + references and supplementary material
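    To make the distant-supervision setup concrete, here is a minimal sketch (not the authors' released code) of how tweets containing one of a fixed emoji set could be turned into noisy (text, label) pairs for emoji-prediction pretraining; the emoji list and example tweets are placeholders.

```python
# Hypothetical sketch of emoji-based distant supervision: tweets containing one
# of a fixed emoji set become noisy (text, label) pairs for pretraining.
# The emoji list and tweets below are placeholders, not the paper's data.

EMOJI_LABELS = ["😂", "😭", "❤️", "😍", "😊"]  # the paper uses 64 common emojis
EMOJI_TO_ID = {e: i for i, e in enumerate(EMOJI_LABELS)}

def make_distant_labels(tweets):
    """Turn raw tweets into (text, emoji_id) pairs; skip tweets with no target emoji."""
    examples = []
    for tweet in tweets:
        present = [e for e in EMOJI_LABELS if e in tweet]
        if not present:
            continue
        # Use the first matching emoji as the noisy label and strip all target
        # emojis from the text so the model cannot read the label directly.
        label = EMOJI_TO_ID[present[0]]
        text = tweet
        for e in present:
            text = text.replace(e, "")
        examples.append((text.strip(), label))
    return examples

print(make_distant_labels(["great game tonight 😂", "no emoji here"]))
```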

    Online social networks: Measurement, analysis, and applications to distributed information systems

    Recently, online social networking sites have exploded in popularity. Numerous sites are dedicated to finding and maintaining contacts and to locating and sharing different types of content. Online social networks represent a new kind of information network that differs significantly from existing networks like the Web. For example, in the Web, hyperlinks between content form a graph that is used to organize, navigate, and rank information. The properties of the Web graph have been studied extensively, and have led to useful algorithms such as PageRank. In contrast, few links exist between content in online social networks; instead, links exist between content and users, and between users themselves. However, little is known in the research community about the properties of online social network graphs at scale, the factors that shape their structure, or the ways they can be leveraged in information systems. In this thesis, we use novel measurement techniques to study online social networks at scale, and use the resulting insights to design innovative new information systems. First, we examine the structure and growth patterns of online social networks, focusing on how users are connecting to one another. We conduct the first large-scale measurement study of multiple online social networks, capturing information about over 50 million users and 400 million links. Our analysis identifies a common structure across multiple networks, characterizes the underlying processes that are shaping the network structure, and exposes the rich community structure. Second, we leverage our understanding of the properties of online social networks to design new information systems. Specifically, we build two distinct applications that leverage different properties of online social networks. We present and evaluate Ostra, a novel system for preventing unwanted communication that leverages the difficulty in establishing and maintaining relationships in social networks. We also present, deploy, and evaluate PeerSpective, a system for enhancing Web search using the natural community structure in social networks. Each of these systems has been evaluated on data from real online social networks or in a deployment with real users
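    As an illustration of the kind of structural measurement described above, the following sketch (assuming a crawled "user -> user" edge list and the networkx library, not the thesis' actual pipeline) computes degree distributions and average clustering for a toy directed social graph.

```python
# Illustrative sketch, not the thesis' measurement pipeline: given a crawled
# "user -> user" edge list, compute basic structural properties of the kind
# studied above (degree distributions, clustering). Requires networkx.
from collections import Counter
import networkx as nx

edges = [("alice", "bob"), ("bob", "carol"), ("carol", "alice"), ("alice", "carol")]
G = nx.DiGraph(edges)

in_deg = Counter(d for _, d in G.in_degree())    # in-degree distribution
out_deg = Counter(d for _, d in G.out_degree())  # out-degree distribution
clustering = nx.average_clustering(G.to_undirected())

print("in-degree distribution:", dict(in_deg))
print("out-degree distribution:", dict(out_deg))
print("average clustering:", clustering)
```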

    Providing Administrative Control and Autonomy in Structured Peer-to-Peer Overlays

    Structured peer-to-peer (p2p) overlays provide a self-organizing substrate for distributed applications and support powerful abstractions such as distributed hash tables (DHTs) and group communication. However, in most of these systems, lack of control over key placement and routing paths raises concerns over autonomy, administrative control and accountability of participating organizations. Additionally, structured p2p overlays tend to assume global connectivity while in reality, network address translation and firewalls limit connectivity among hosts in different organizations. In this paper, we present a general technique that ensures content/path locality and administrative autonomy for participating organizations, and provides natural support for NATs and firewalls. Instances of conventional structured overlays are configured to form a hierarchy of identifier spaces that reflects administrative boundaries and respects connectivity constraints among networks
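    A toy sketch of the hierarchical-lookup idea, under the assumption that each organization runs its own ring and escalates lookup misses to a parent ring; this illustrates content/path locality, not the paper's actual protocol.

```python
# Hypothetical illustration of lookup across a hierarchy of identifier spaces:
# resolve within the local organization's ring first, then escalate to the
# parent ring. This is a toy model, not the paper's protocol.

class Ring:
    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.store = {}          # key -> value held by nodes in this ring

    def put(self, key, value):
        self.store[key] = value  # content placed in this ring stays local

    def lookup(self, key):
        """Resolve within this ring; escalate to the parent only on a miss."""
        if key in self.store:
            return self.name, self.store[key]
        if self.parent is not None:
            return self.parent.lookup(key)
        return None, None

global_ring = Ring("global")
org_a = Ring("org-a", parent=global_ring)
org_a.put("report.pdf", "stored on org-a nodes")
global_ring.put("public.html", "stored on global nodes")

print(org_a.lookup("report.pdf"))   # resolved locally: content/path locality
print(org_a.lookup("public.html"))  # escalated to the parent ring
```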

    Algorithms that "Don't See Color": Comparing Biases in Lookalike and Special Ad Audiences

    Today, algorithmic models are shaping important decisions in domains such as credit, employment, or criminal justice. At the same time, these algorithms have been shown to have discriminatory effects. Some organizations have tried to mitigate these effects by removing demographic features from an algorithm's inputs. If an algorithm is not provided with a feature, one might think, then its outputs should not discriminate with respect to that feature. This may not be true, however, when there are other correlated features. In this paper, we explore the limits of this approach using a unique opportunity created by a lawsuit settlement concerning discrimination on Facebook's advertising platform. Facebook agreed to modify its Lookalike Audiences tool - which creates target sets of users for ads by identifying users who share "common qualities" with users in a source audience provided by an advertiser - by removing certain demographic features as inputs to its algorithm. The modified tool, Special Ad Audiences, is intended to reduce the potential for discrimination in target audiences. We create a series of Lookalike and Special Ad audiences based on biased source audiences - i.e., source audiences that have known skew along the lines of gender, age, race, and political leanings. We show that the resulting Lookalike and Special Ad audiences both reflect these biases, despite the fact that the Special Ad Audiences algorithm is not provided with the features along which our source audiences are skewed. More broadly, we provide experimental proof that removing demographic features from a real-world algorithmic system's inputs can fail to prevent biased outputs. Organizations using algorithms to mediate access to life opportunities should consider other approaches to mitigating discriminatory effects
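    The measurement idea can be illustrated with a small sketch that compares the demographic composition of a skewed source audience with the audience a tool generates from it; the records and numbers below are fabricated purely for illustration.

```python
# Illustrative sketch of the measurement idea: compare the demographic makeup
# of a (deliberately skewed) source audience with the audience an ad tool
# generates from it. The records below are made up for illustration only.
from collections import Counter

def composition(audience, attribute):
    """Fraction of the audience in each category of a demographic attribute."""
    counts = Counter(person[attribute] for person in audience)
    total = sum(counts.values())
    return {k: v / total for k, v in counts.items()}

source = [{"gender": "woman"}] * 90 + [{"gender": "man"}] * 10       # skewed input
generated = [{"gender": "woman"}] * 78 + [{"gender": "man"}] * 22    # tool's output

print("source:   ", composition(source, "gender"))
print("generated:", composition(generated, "gender"))
# If the generated audience mirrors the source skew even when demographic
# features are withheld from the algorithm, the bias has propagated.
```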

    Understanding and Specifying Social Access Control Lists

    Online social network (OSN) users upload millions of pieces of content to share with others every day. While a significant portion of this content is benign (and is typically shared with all friends or all OSN users), there are certain pieces of content that are highly privacy sensitive. Sharing such sensitive content raises significant privacy concerns for users, and it becomes important for the user to protect this content from being exposed to the wrong audience. Today, most OSN services provide fine-grained mechanisms for specifying social access control lists (social ACLs, or SACLs), allowing users to restrict their sensitive content to a select subset of their friends. However, it remains unclear how these SACL mechanisms are used today. To design better privacy management tools for users, we need to first understand the usage and complexity of SACLs specified by users. In this paper, we present the first large-scale study of fine-grained privacy preferences of over 1,000 users on Facebook, providing us with the first ground-truth information on how users specify SACLs on a social networking service. Overall, we find that a surprisingly large fraction (17.6%) of content is shared with SACLs. However, we also find that the SACL membership shows little correlation with either profile information or social network links; as a result, it is difficult to predict the subset of a user’s friends likely to appear in a SACL. On the flip side, we find that SACLs are often reused, suggesting that simply making recent SACLs available to users is likely to significantly reduce the burden of privacy management on users.
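    The reuse finding suggests a simple aid: when a user picks an audience for new content, check whether it matches a recently used SACL and offer that as a shortcut. Below is a hypothetical sketch; the function name and data are made up, not from the paper.

```python
# Hypothetical sketch of the "reuse recent SACLs" idea suggested by the study:
# when a user shares new content, check whether the chosen friend subset
# matches one of their recently used SACLs and offer it as a one-click option.

def suggest_recent_sacl(new_audience, recent_sacls):
    """Return the most recent previously used SACL identical to new_audience, if any."""
    target = frozenset(new_audience)
    for sacl in recent_sacls:              # assumed ordered most recent first
        if frozenset(sacl) == target:
            return sacl
    return None

recent = [["ana", "bo"], ["ana", "bo", "cam"], ["dee"]]
print(suggest_recent_sacl(["bo", "ana"], recent))   # reuse detected
print(suggest_recent_sacl(["ana", "dee"], recent))  # no match, new SACL needed
```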

    Problematic Advertising and its Disparate Exposure on Facebook

    Targeted advertising remains an important part of the free web browsing experience, where advertisers' targeting and personalization algorithms together find the most relevant audience for millions of ads every day. However, given the wide use of advertising, this also enables using ads as a vehicle for problematic content, such as scams or clickbait. Recent work that explores people's sentiments toward online ads, and the impacts of these ads on people's online experiences, has found evidence that online ads can indeed be problematic. Further, there is the potential for personalization to aid the delivery of such ads, even when the advertiser targets with low specificity. In this paper, we study Facebook -- one of the internet's largest ad platforms -- and investigate key gaps in our understanding of problematic online advertising: (a) What categories of ads do people find problematic? (b) Are there disparities in the distribution of problematic ads to viewers? and if so, (c) Who is responsible -- advertisers or advertising platforms? To answer these questions, we empirically measure a diverse sample of user experiences with Facebook ads via a 3-month longitudinal panel. We categorize over 32,000 ads collected from this panel (n=132), and survey participants' sentiments toward their own ads to identify four categories of problematic ads. Statistically modeling the distribution of problematic ads across demographics, we find that older people and minority groups are especially likely to be shown such ads. Further, given that 22% of problematic ads had no specific targeting from advertisers, we infer that ad delivery algorithms (advertising platforms themselves) played a significant role in the biased distribution of these ads.
    Comment: Accepted to USENIX Security 2023
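    A rough sketch of the kind of statistical modeling mentioned above: a logistic regression of whether an impression was rated problematic on viewer demographics. The tiny DataFrame is fabricated so the example runs; it is not the paper's data or exact model.

```python
# Illustrative sketch (not the paper's exact model): logistic regression of
# whether an ad impression was rated problematic on viewer demographics.
# The tiny DataFrame below is fabricated purely to make the example run.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.DataFrame({
    "problematic": [0, 1, 0, 1, 1, 0, 1, 0, 1, 0, 1, 0],
    "age":         [25, 67, 31, 70, 34, 29, 72, 65, 68, 41, 33, 60],
    "minority":    [0, 1, 0, 1, 0, 0, 1, 1, 1, 0, 0, 0],
})

model = smf.logit("problematic ~ age + minority", data=df).fit(disp=False)
print(model.params)  # positive coefficients => higher odds of seeing problematic ads
```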

    Understanding the Role of Registrars in DNSSEC Deployment

    The Domain Name System (DNS) provides a scalable, flexible name resolution service. Unfortunately, its unauthenticated architecture has become the basis for many security attacks. To address this, DNS Security Extensions (DNSSEC) were introduced in 1997. DNSSEC’s deployment requires support from the top-level domain (TLD) registries and registrars, as well as participation by the organization that serves as the DNS operator. Unfortunately, DNSSEC has seen poor deployment thus far: despite being proposed nearly two decades ago, only 1% of .com, .net, and .org domains are properly signed. In this paper, we investigate the underlying reasons why DNSSEC adoption has been remarkably slow. We focus on registrars, as most TLD registries already support DNSSEC and registrars often serve as DNS operators for their customers. Our study uses large-scale, longitudinal DNS measurements to study DNSSEC adoption, coupled with experiences collected by trying to deploy DNSSEC on domains we purchased from leading domain name registrars and resellers. Overall, we find that a select few registrars are responsible for the (small) DNSSEC deployment today, and that many leading registrars do not support DNSSEC at all, or require customers to take cumbersome steps to deploy DNSSEC. Further frustrating deployment, many of the mechanisms for conveying DNSSEC information to registrars are error-prone or present security vulnerabilities. Finally, we find that using DNSSEC with third-party DNS operators such as Cloudflare requires the domain owner to take a number of steps that 40% of domain owners do not complete. Having identified several operational challenges for full DNSSEC deployment, we make recommendations to improve adoption
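    A minimal sketch of checking whether a domain publishes the records DNSSEC needs (a DS record at the parent and DNSKEY records in the zone), assuming the dnspython library is installed; it checks record presence only and does not validate signatures or the chain of trust, so it is a rough proxy for "properly signed", not the paper's measurement methodology.

```python
# Minimal DNSSEC presence check using dnspython (assumed installed):
# a domain is treated as "signed" only if the parent zone publishes a DS record
# and the zone itself serves DNSKEY records. Presence check only; this does not
# validate signatures or the full chain of trust.
import dns.resolver

def has_dnssec_records(domain):
    def exists(rdtype):
        try:
            dns.resolver.resolve(domain, rdtype)
            return True
        except (dns.resolver.NoAnswer, dns.resolver.NXDOMAIN, dns.resolver.NoNameservers):
            return False
    return exists("DS") and exists("DNSKEY")

if __name__ == "__main__":
    for d in ["cloudflare.com", "example.com"]:
        print(d, "publishes DS and DNSKEY records:", has_dnssec_records(d))
```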